Evaluation of Fault Tolerance Latency from Real-Time Application's Perspectives

Authors

  • Hagbae Kim
  • Kang G. Shin

Abstract

The Fault-Tolerance Latency (FTL), defined as the time required by all sequential steps taken to recover from an error, is important to the design and evaluation of fault-tolerant computers used in safety-critical real-time control systems. To meet timing constraints or avoid dynamic failure, the latency of any fault-handling policy (which consists of several stages such as error detection, fault location, and recovery) must not exceed the Application Required Latency (ARL), which depends upon the controlled process under consideration and its operating environment. We evaluate the FTL while considering various fault-tolerance mechanisms, and use the evaluated FTL to check whether a fault-handling policy can meet the timing constraint FTL ≤ ARL for a given real-time application. The FTL depends on the underlying fault-handling mechanisms as well as on fault behaviors during the application of temporal-redundancy recovery such as instruction retry or program rollback. We investigate all possible fault-handling scenarios and represent the FTL with several random and deterministic variables that model the fault behaviors and/or the capability and performance of the fault-handling mechanisms. We also present a simple example to demonstrate the application of the evaluated FTL in real-time systems, where an appropriate fault-handling policy is selected to meet the timing requirement with the minimum degree of spatial redundancy. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the view of the funding agencies.
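The constraint FTL ≤ ARL and the policy-selection example described above can be illustrated with a minimal Python sketch. It is not the paper's model: the paper represents FTL with random and deterministic variables, whereas this sketch assumes a deterministic worst case in which a permanent fault defeats every retry or rollback attempt before reconfiguration takes over. The `Policy` fields, the policy names, and all latency values (in arbitrary time units) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Policy:
    """Hypothetical fault-handling policy; all latencies share one time unit."""
    name: str
    redundancy: int      # degree of spatial redundancy (number of modules)
    t_detect: float      # error-detection latency
    t_locate: float      # fault-location latency
    t_retry: float       # latency of one retry or rollback attempt
    max_retries: int     # temporal-redundancy attempts before reconfiguring
    t_reconfig: float    # reconfiguration latency once retries are exhausted


def worst_case_ftl(p: Policy) -> float:
    """Sum of sequential stage latencies, assuming (hypothetically) that a
    permanent fault makes every retry fail before reconfiguration."""
    return p.t_detect + p.t_locate + p.max_retries * p.t_retry + p.t_reconfig


def select_policy(policies: Sequence[Policy], arl: float) -> Optional[Policy]:
    """Return the feasible policy (FTL <= ARL) with the least spatial
    redundancy, breaking ties by smaller FTL; None if none is feasible."""
    feasible = [p for p in policies if worst_case_ftl(p) <= arl]
    if not feasible:
        return None
    return min(feasible, key=lambda p: (p.redundancy, worst_case_ftl(p)))


# Illustrative values only; they are not taken from the paper.
POLICIES = [
    Policy("duplex+rollback", 2, 1.0, 2.0, 2.0, 3, 5.0),
    Policy("duplex+retry", 2, 1.0, 2.0, 0.5, 2, 5.0),
    Policy("TMR-masking", 3, 0.1, 0.0, 0.0, 0, 0.0),
]

if __name__ == "__main__":
    for arl in (10.0, 5.0, 0.05):
        chosen = select_policy(POLICIES, arl)
        print(arl, chosen.name if chosen else "no feasible policy")
```

With an ARL of 10 units, both the duplex-with-retry policy (worst-case FTL 9.0) and TMR masking satisfy the constraint, and the sketch picks the duplex policy because it uses less spatial redundancy; with an ARL of 5 units only TMR masking remains, and with 0.05 units no policy is feasible.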

Similar articles

Allocation of Real-Time Computations under Fault Tolerance Constraints

Allocation of resources in "next-generation" real-time operating systems requires some important features in addition to those demonstrated by current systems, resulting in an increased complexity of each system. The allocation is closely related to the scheduling, and the two are based on time considerations, rather than on a static priority scheme. The allocation is fault tolerance motivated,...

Replication and Resubmission Based Adaptive Decision for Fault Tolerance in Real Time Cloud Computing: A New Approach

Cloud computing, an adoptable technology, is the upshot of the evolution of on-demand service in the computing paradigm of immense-scale distributed computing. With the rising demands on and benefits of cloud computing infrastructure, society can take advantage of intensive computing capability services and the scalable, virtualized environment of cloud computing to carry out real-time tasks executed on a remote clou...

Process Load

While many researchers believe that multimedia applications are best managed with hard, real-time scheduling mechanisms, models based on application-level adaptation with relaxed scheduling constraints are gaining acceptance. We analyze an existing video conferencing application that was designed without explicit support for CPU resource management, and propose modifications to its architectur...

Chapter 17 Action-level Fault Tolerance

Action-level (AL) fault tolerance means accomplishing every critical action (output action of a critical task as specified) successfully in spite of component failures. It therefore aims at the highest degree of fault tolerance in real-time computer systems. Several basic techniques developed in recent years for realizing action-level fault tolerance in real-time LAN (local area network)-b...

A Low-Latency DMR Architecture with Efficient Recovering Scheme Exploiting Simultaneously Copiable SRAM

This paper presents a novel architecture for a fault-tolerant high-performance system using a checkpoint/restart approach with dual modular redundancy (DMR). The proposed architecture can perform low-latency copy with instantaneously copiable SRAM. Furthermore, we can use an instantaneous comparison scheme that has more fault coverage than comparison with a cyclic redundancy check (CRC). Evalua...

Journal:
  • IEEE Trans. Computers

Volume 49

Publication date: 2000